/*This "master" do file highlights the steps and code that needs to be run to construct the bank health data used in 
“The Effects of Changes in Local Bank Health on Household Consumption” by Daniel Cooper and Joe Peek.  Not all steps can
be run with the code and data provided because some confidential data is required that cannot be posted.  Conditional on obtaining
this data, the code can be run and we have provided information on how to obtain it.

Stata packages might need to be installed:
	-renvars	====>>  net install dm88_1.pkg
	-tsspell	====>>  ssc install tsspell
	-isvar		====>>  ssc install isvar
*/

/*
Abbreviations used below:
BH - Bank Health
CB - Commercial Bank
SSB - State Savings Bank
SOD - Summary of Deposits
Call - Call Report Data


Variable naming conventions:
"L" - Bank Classified as Local Bank 
"M" - Bank Classified as Multi-Locational Bank 
"05" - The threshold at which the bank is classified as local or multi
"10" - The threshold at which the bank is classified as local or multi 
"_refix" - variables or filenames that are affected by the FFIEC 034 Form Filer issue (discussed in bhreadme.docx)
*/
*=====================================================================================================================================================
*=====================================================================================================================================================

local do_dir "<set path to do files here>" 	
	
cd "`do_dir'"

*-------------------------------------------------------------------------------------
*##  01a: INPUT RAW CALL REPORT DATA (unable to provide raw data)
*-------------------------------------------------------------------------------------

*<<Notes on replicating data :
	/*
	** We use historical call report data compiled from internal sources. We are unable to provide ANY data from these sources.
		However, interested parties can run the following do files on commercial bank call report data, which are available from 
		the Chicago FRB and FFIEC: 
			-https://www.chicagofed.org/banking/financial-institution-reports/commercial-bank-data 
			-see bhreadme.docx for more information 
		
	** If state savings bank (SSB) data (which are used in this analysis and not available from the link above) are desired, interested parties have 
	   two options:
		1) Fill out a Freedom of Information Act Request - you may want to contact basd@frb.gov for help with this
		2) Alternatively, one would have to:
			2a) Go to this site: https://cdr.ffiec.gov/public/ManageFacsimiles.aspx 
			2b) Perform institution-by-institution searches with the list of rssd-ids for the relevant SSBs

	** bh2020-sub-01a-createCallData.do provides a call report variable list, and is one possible option for processing the raw quarterly Call Report data 
	files. Of course, alternative approaches can be employed (and may be required) depending on the source of your data. 	

	** the output file from this step (bh-01a-callAppended.dta) should be a bank-level, quarterly time series, from 1984:Q1 through the present.
		-more detail within do file *-01a-* as well as in bhreadme.docx
	*/

*---------------------------------------------------------------
*##########  01b: PREP Call Report DATA
*---------------------------------------------------------------
*<<Data Notes:
	/*
	This step: 
	1) Constructs call report sample of banks/entities
		--keeps only CBs/ SSBs
		--drops US territories
		--drops banks with missing assets, etc. 
	2) Constructs relevant bank balance sheet measures (financial ratios):
		-Total & Real Estate NonPerforming Loan ratios 
		-Total & Real Estate NonPerforming Assets ratios
		
	-more detail within the do file below and in bhreadme.docx-
	
	*/	
	*<<Input Data FileName: RawCallReportData.dta 

		do "`do_dir'/bh2020-sub-01b-prepCallData.do"
		
	*>>Output Data FileName: bh-01b-call.dta
	
	

*---------------------------------------------------------------
*##########  01c: RAW DATA PREP - Summary of Deposits (SOD) - 1984-Q1 through current.

/*Data source note:

	**We are again unable to provide raw data for the SOD data used here as it is compiled from confidential internal sources.
		These sources are based on a data sharing agreement with the FDIC and thus, access must be granted directly from the FDIC.
		
	**However, data starting in 1994 are publicly available from the FDIC: https://www7.fdic.gov/sod/dynaDownload.asp?barItem=6
	  The FDIC does not provide data electronically before 1994, however, interested parties can contact the FDIC and request historical data, a FOIA 
	  request may or may not be necessary.
	  
	**Deposit data prior to 1994 are present in the SOD data obtained from our internal source - (National Information Center [NIC] database)
		Note: Public SOD data and the SOD data used in this analysis contain identical bank charter types.
			However they differ in two respects:
				1) As stated above, the length of available history is shorter in the public data
				2) The variable names differ between sources:
					-For more detail on navigating the naming conventions from different source's, see bhdocumentation.xlsx (tab 'MapVarNames-Btwn-SOD-Sources')
					
*/


*---------------------------------------------------------------
*<<Data Notes:
/*
This step:
** Cleans the SOD sample
	--keeps only call report filers
	--drops banks/branches with missing deposits
	--removes US territories, etc.

	-more detail within and in bhreadme.docx-
*/

	*<<Input Data FileNames [Deposits]: RawSOD_SUMDTable | RawSOD_SUMSTable.dta [or public FDIC csvs]
	*<<Input Data FileNames [Geocode Checks]: list1.xls (copy statement provided for file download) | listOfCBSAInPSID (not publicly available)
		
		do "`do_dir'/bh2020-sub-01c-prepDepositsData.do"
		
	*>>Output Data FileName:  bh-01c-SOD-wGeocodes.dta
		



*---------------------------------------------------------------------------------------------
*##########  02a: JOIN Call Report (step 01b) &  SOD (step 01c) data to make master data file
*    (file contains bank health and deposit data by location)
*---------------------------------------------------------------------------------------------

*<<Data Notes:
/*
This step:
	1) Merges each year/quarter of data from the call report database into the annual SOD data (converts SOD data to quarterly.) */
	
/*		
	2) Implements the bank minimum rule for MSAs/CBSAs 
		- The rule says:
				"where an MSA mostly has enough banks, make sure there are at least 4 years in a row where they have an acceptable number of banks."
	
	
	-more detail in the below do file and in bhreadme.docx-

*/
	
	*<<Input Data FileNames: bh-01b-call.dta | bh-01c-SOD-wGeocodes.dta

		do "`do_dir'/bh2020-sub-02a-mergeCallSOD.do"

	*>>Output Data FileName: bh-02a-masterCallSOD.dta
		


*----------------------------------------------------------------------------------
*##########  03a: SPLIT MASTER DATA FILE BY LOCATION - (INCLUDES TREATEMENT OF 034 FORM FILERS ('_REFIX') )
*----------------------------------------------------------------------------------
/*<< Data Notes:

*/



/*
This step:
1) Constructs total deposits by entity prior to splitting data by location. 
	- This total serves as denominator for "location-to-entity" deposit shares, which will be used to create groups of Local and Multi-Locational 'entities'.
	- See bhreadme.docx for more on the Local/Multi-locational classifications

2) Splits the master data file (generated in 02a) into 6 separate datasets:
	-3 files that include the entire time series of bank health data, split by location:
		--bh-03a-MSA* (MSA BH measures)
		--bh-03b-nMSA* (nonMSA BH measures)
		--bh-03c-ST* (State BH measures)
		
	-3 files that:
		-include only pre-1985q2 data
		-exclude banks that file the FFIEC 034 Form 
			-(these banks do not report the relevant real estate line items---see bhreadme.docx for further details)
			
		filenames:
		--bh-03d-MSA*_refix.dta
		--bh-03e-nMSA*_refix.dta
		--bh-03f-ST*_refix.dta 
*/

	*<<Input Data FileNames: bh-02a-masterCallSOD.dta

		do "`do_dir'/bh2020-sub-03a-splitDatasetsByLocation.do"

	*<<Output Data FileNames:
	/*
	bh-03a-MSA-Locations.dta | bh-03b-nMSA-Locations.dta | bh-03c-ST-Locations.dta 
	bh-03d-MSA-Locations_refix.dta | bh-03e-nMSA-Locations_refix.dta | bh-03f-ST-Locations_refix.dta	
	

	*/

	

*--------------------------------------------------------------------------------
*##########  04a: CONSTRUCT (DEPOSIT WEIGHTED) BANK HEALTH MEASURES BY LOCATION
*--------------------------------------------------------------------------------
*<<Data Notes:

/* 		
	**  Both the input and output data in this step retain the [date/location/bank-id] aggregation, despite measures produced here 
		by date/location. This counterintuitive structure was useful for validation / error checking, 
		but alternative methods can certainly be employed.
*/	

/*

This step:

	1) Creates the deposit-weighted (overall) bank health measures for each location
		--Main percentile-based measures (50th, 25th, 10th percentiles)
	    --Weighted averaged measures 

	2) Creates Local/Multi-Locational 'entity-types' 
		--based on the "location-to-entity" deposit share, banks are grouped in various ways according to their relationship to a given location.
		--see bhreadme.docx and do file below (04a) for details on this process.
	
*/
	
	*<<Input Data FileNames: 
		//bh-03a-MSA-Locations.dta | bh-03b-nMSA-Locations.dta | bh-03c-ST-Locations.dta 
		//bh-03d-MSA-Locations_refix.dta | bh-03e-nMSA-Locations_refix.dta | bh-03f-ST-Locations_refix.dta

		do "`do_dir'/bh2020-sub-04a-makeBhMeasures.do"

	*<<Output Data FileNames:			
		/*
		bh-04a-MSA-Locations-wavgsAndPercentiles.dta 		| bh-04b-nMSA-Locations-wavgsAndPercentiles.dta 		| 	bh-04c-ST-Locations-wavgsAndPercentiles.dta
		bh-04d-MSA-Locations_refix-wavgsAndPercentiles.dta 	| bh-04e-nMSA-Locations_refix-wavgsAndPercentiles.dta 	|	bh-04f-ST-Locations_refix-wavgsAndPercentiles.dta
		*/	

	

*--------------------------------------------------------------
*##########  05a: FINAL SETUP AND DATA CLEANING
*--------------------------------------------------------------


/*
This step:
	1)  Does some renaming / labeling of variables based on 'entity-type' (Local/Multi, 5/10 threshold, bank-id/bhc-id - see above / bhreadme.docx)
	2)  Orders variables by conventions above, so like-'entity-types' are grouped together.
		-- (e.g., all measures for Local banks -defined as such at the 5% 'location-to-bank' threshold, i.e., L05ent_* --should be grouped together) 
	3)  Creates an indicator variable for locations the have no real estate NPL measures, due to being comprised ONLY of 034 filers prior to 1985Q3. 
*/

	
	*<<Input Data FileNames:
		/*
		bh-04a-MSA-Locations-wavgsAndPercentiles.dta | bh-04b-nMSA-Locations-wavgsAndPercentiles.dta | bh-04c-ST-Locations-wavgsAndPercentiles.dta
		bh-04d-MSA-Locations_refix-wavgsAndPercentiles.dta | bh-04e-nMSA-Locations_refix-wavgsAndPercentiles.dta | bh-04f-ST-Locations_refix-wavgsAndPercentiles.dta
		*/	
	
		do "`do_dir'/bh2020-sub-05a-reCombineAndFinalCleaning.do"
	
	
	*>>Output Data Filenames:
		/*
		bh-05a-allBhMeasuresByMSA
		bh-05a-allBhMeasuresBynMSA
		bh-05a-allBhMeasuresByST
		*/






